spec-6.0a production shared-storage backend matrix #12

Merged
sqlrush merged 17 commits into main from spec-6.0a-production-storage
Jul 1, 2026

sqlrush commented Jun 30, 2026

Owner

Status

Spec-6.0a early implementation / PR-ready storage backend work. This PR is ready for review and CI exposure, but remains merge-blocked: do not merge until spec-5.19 + 5.21 beta close-out, Stage 6 entry, and a final D0 re-ground against current origin/main are complete.

Not shipped. Not a release candidate. No tag or release flow is part of this PR.

Before merge: rebase to then-current origin/main, redo final D0 re-ground, rerun required local/CI gates, and refresh this evidence.

Completed

  • Added provider capability framework via ClusterSharedFsCaps, durability/fence capability hooks, barrier_sync, and fence-key registration dispatch.
  • Added production block_device shared-storage backend with raw layout superblock, bitmap, directory, extent slots, logical EOF, O_DIRECT validation, and fail-closed startup when required config is absent.
  • Added raw extent allocator serialization using GES X when clustered and local flock() when no cluster owner is available.
  • Added per-handle in-memory extent cache to avoid per-block metadata walks on normal I/O.
  • Added startup verification of raw-layout invariants to defend against overlapping extent ownership across relations/forks.
  • Fixed the extend crash window by syncing zero-fill before publishing logical EOF.
  • Added crash-safe RM_CLUSTER_RAW_LAYOUT metadata WAL records, pg_waldump descriptor wiring, and XLOG magic bump for the new rmgr.
  • Fixed startup redo handling for raw-layout metadata writes: relation-WAL replay no longer recursively emits RM_CLUSTER_RAW_LAYOUT, while non-recovery no-WAL metadata writes still fail closed.
  • Wired SYNC_HANDLER_CLUSTER_SHARED and cluster_smgr fsync registration/barrier fallback paths.
  • Added SCSI-3 PR fence-driver interface with Linux SG_IO probe/register helpers; unsupported or unavailable PR capability fails closed when scsi3_pr is forced.
  • Wired storage advisory callbacks for prefetch/writeback/zeroextend and provider-level hooks.
  • Added block-device wait events and updated view/unit/TAP/regress snapshots to 110 cluster wait events.
  • Added raw block-device conformance unit coverage, two-relation raw-layout coverage, single-node crash-restart/WAL-redo coverage, and 2-node owner-agnostic raw-layout storm coverage.
  • Added scripts/perf/run-storage-io-matrix.sh, fast/nightly/perf workflow coverage, and report-only perf evidence behavior.
  • Added docs/cluster/shared-storage-backends.md as the implementation note/amend trail for frozen-spec deltas.
  • Completed CLAUDE.md rule 11 style pass: full pgrac banners for new C/H files and PGRAC MODIFICATIONS/local markers for modified PG-original files.
  • Removed the off-scope recovery/torn-tail net diff from this storage PR; latest diff against origin/main has no changes to the prior recovery files.
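
The extend fix listed above (sync the zero-fill before publishing logical EOF) can be sketched as follows. This is a minimal illustration and not the backend's code: SketchHeader, sketch_extend, the one-header-block-then-data layout, and the 8192-byte block size are all assumptions made for the example, and plain pwrite()/fsync() stand in for the O_DIRECT path. The point is only the ordering: a crash between the two fsync() calls leaves the old EOF in place, so no reader can see a published block whose zero-fill never reached disk.

```c
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_BLOCK_SIZE 8192	/* assumed block size, for illustration */

/* Hypothetical header at byte 0; data blocks start at block 1. */
typedef struct SketchHeader
{
	uint64_t	logical_eof_blocks; /* number of published data blocks */
} SketchHeader;

/*
 * Extend the file by new_blocks zeroed blocks.
 * Returns 0 on success, -1 on any I/O error.
 */
int
sketch_extend(int fd, uint64_t new_blocks)
{
	SketchHeader hdr;
	char		zeros[SKETCH_BLOCK_SIZE];
	uint64_t	i;

	assert(fd >= 0);

	if (pread(fd, &hdr, sizeof(hdr), 0) != (ssize_t) sizeof(hdr))
		return -1;

	/* Step 1: zero-fill the new blocks past the current logical EOF. */
	memset(zeros, 0, sizeof(zeros));
	for (i = 0; i < new_blocks; i++)
	{
		off_t		off = (off_t) (1 + hdr.logical_eof_blocks + i) * SKETCH_BLOCK_SIZE;

		if (pwrite(fd, zeros, sizeof(zeros), off) != (ssize_t) sizeof(zeros))
			return -1;
	}

	/* Step 2: make the zero-fill durable BEFORE anyone can see it. */
	if (fsync(fd) != 0)
		return -1;

	/* Step 3: only now publish the new logical EOF, and sync that too. */
	hdr.logical_eof_blocks += new_blocks;
	if (pwrite(fd, &hdr, sizeof(hdr), 0) != (ssize_t) sizeof(hdr))
		return -1;
	return fsync(fd) == 0 ? 0 : -1;
}
```

Swapping steps 2 and 3 would reopen the crash window: the header could become durable while the new blocks still hold stale contents.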

D0 Re-ground After PR #14 / spec-6.5

  • Fetched origin/main on 2026-07-01 and confirmed it contains spec-6.5 via merge commit 955d12651b308bd1cc90d33b0202661a325683cf (Merge pull request #14 from sqlrush/spec-6.5-cluster-backup-restore-pitr).
  • Rebased spec-6.0a-production-storage onto that origin/main.
  • Current PR base: 955d12651b308bd1cc90d33b0202661a325683cf.
  • Current PR head: cb5d4206d60baf01e49bff42b288bf3ae47b42b1.
  • PR #14 (spec-6.5: cluster-aware backup / restore / PITR substrate) is treated as established fact on main. This PR preserves its fail-closed backup/PITR substrate and only replays the 6.0a production shared-storage backend changes on top.

D0 Notes

  • Spec requested SQLSTATE slots 58R02/58R03, but origin/main already allocated class 58 through 58R13; this branch uses 58R14 for cluster_storage_io_alignment and 58R15 for cluster_storage_fence_unavailable to preserve the existing dense roster.
  • CI-portable validation uses raw-image block_device coverage. Hardware O_DIRECT block-device and SCSI-3 PR hardware legs remain external/manual before any release decision.
  • The raw backend uses the voting-disk style raw-fd path (BasicOpenFile(..., PG_O_DIRECT)) instead of adding a new fd.c VFD substrate; this is documented in docs/cluster/shared-storage-backends.md.
  • src/bin/pg_waldump/clusterrawdesc.c is a generated symlink produced by the pg_waldump Makefile from src/backend/access/rmgrdesc/clusterrawdesc.c; it is intentionally not tracked.
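
For readers unfamiliar with why an O_DIRECT path needs its own alignment error (cluster_storage_io_alignment, 58R14 above), here is a rough sketch of the kind of check involved. It is an assumption-laden illustration, not the PR's validation code: sketch_io_is_aligned is a hypothetical helper, and the required alignment is passed in by the caller because the real value depends on the device and filesystem. O_DIRECT transfers generally require the buffer address, the file offset, and the length to all be multiples of that alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true when a request is acceptable for direct I/O under the given
 * alignment: buffer address, file offset, and length must all be multiples
 * of 'align', and the length must be nonzero.
 *
 * 'align' must be a nonzero power of two (for example a logical sector size).
 * The buffer is never dereferenced; only its address is inspected.
 */
bool
sketch_io_is_aligned(const void *buf, uint64_t offset, size_t len, size_t align)
{
	size_t		mask;

	assert(align != 0 && (align & (align - 1)) == 0);
	mask = align - 1;

	if (len == 0)
		return false;
	return ((uintptr_t) buf & mask) == 0 &&
		(offset & (uint64_t) mask) == 0 &&
		(len & mask) == 0;
}
```

A caller in this style would reject the request up front with the alignment error rather than pass a misaligned transfer to the kernel, where it would typically fail with EINVAL.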

Local Validation

Post-6.5-rebase local validation:

  • make -s -j4 passed.
  • git diff --check origin/main...HEAD passed.
  • scripts/ci/check-comment-headers.sh passed.
  • scripts/ci/check-scn-cmp-gate.sh passed.
  • scripts/ci/check-ges-mode-gate.sh passed.
  • scripts/ci/check-no-clog-overlay.sh passed.
  • scripts/ci/check-tidy.sh skipped locally because clang-tidy is not installed; GitHub validate installs and runs it.
  • make -C src/test/cluster_unit check passed: 137 cluster unit binaries.
  • make -C src/test/cluster_tap check PROVE_TESTS="t/018_shared_fs.pl t/332_block_device_backend.pl" passed: 23 TAP tests.
  • make -C src/test/cluster_tap check PROVE_TESTS="t/010_views.pl t/030_acceptance.pl t/050_shared_storage_initdb.pl t/200_stage2_acceptance_capability.pl t/226_stage3_mvcc_acceptance_capability.pl t/273_stage4_recovery_acceptance_capability.pl t/332_block_device_backend.pl" passed: 189 TAP tests.
  • make -C src/test/cluster_tap check PROVE_TESTS="t/248_shared_merged_recovery.pl t/274_stage4_recovery_hardgate.pl t/300_cluster_5_50_cr_profile.pl" passed: 94 TAP tests.
  • make -C src/test/cluster_tap check PROVE_TESTS="t/011_gviews.pl t/012_ic.pl t/013_conf.pl t/014_ic_mock.pl t/015_inject.pl t/016_perfmon.pl t/017_debug.pl t/020_shmem_registry.pl t/021_block_format.pl t/022_itl_slot.pl t/023_buffer_descriptor.pl t/108_pcm_state_machine.pl t/110_gcs_loopback.pl t/111_gcs_block_ship_2node.pl t/112_gcs_block_retransmit_2node.pl t/113_gcs_block_2way_2node.pl t/114_gcs_block_3way_3node.pl t/115_gcs_block_3way_3node.pl t/116_gcs_block_lost_write_2node.pl t/117_sinval_broadcast_2node.pl t/118_sinval_ddl_propagation_2node.pl t/203_cluster_tt_status_foundation.pl" passed: 497 TAP tests.
  • make -C src/test/cluster_regress check passed: 12 SQL smoke tests.
  • Clean archive-copy --disable-cluster build passed in /private/tmp/linkdb-disable-src-60a-d6b70e.
  • scripts/perf/run-storage-io-matrix.sh returned unavailable evidence and exit 0 when the default install prefix was absent.
  • Full src/test sweep found no remaining stale wait-event baseline assertions for the old 103 count; shmem region TAP expectations are updated to 69/68 after the 6.5 backup region.

Latest local validation for head cb5d4206d60baf01e49bff42b288bf3ae47b42b1:

  • git diff --check passed.
  • perl -I src/test/perl -I src/test/cluster_tap/lib -c src/test/cluster_tap/t/333_block_device_multinode.pl passed.
  • make -C src/test/cluster_unit check passed: 137 cluster unit binaries.
  • make -C src/test/cluster_tap check PROVE_TESTS="t/332_block_device_backend.pl t/333_block_device_multinode.pl" passed: 19 TAP tests.

Local note:

  • On this macOS workstation, scripts/ci/check-format.sh with clang-format 22 reports existing full-tree formatting differences in non-6.0a cluster files. git diff --check origin/main...HEAD is clean; GitHub fast-gate validate is the authoritative clang-format check for the PR toolchain.
  • An attempted parallel local rerun of cluster_unit and cluster_tap collided on shared tmp_install cleanup; both gates passed when rerun serially.

PR CI

Remaining Merge Gate

  • Blocked from merge by spec-5.19 + 5.21 beta close-out.
  • Final D0 re-ground against then-current origin/main required before merge.
  • Rebase to current origin/main and rerun local required gates, fast PR CI, and nightly CI after final re-ground.

sqlrush force-pushed the spec-6.0a-production-storage branch 3 times, most recently from 079d0ce to 0615752 (July 1, 2026 02:01)
sqlrush force-pushed the spec-6.0a-production-storage branch from f7ac432 to d6b70e7 (July 1, 2026 04:23)
sqlrush changed the title from "draft: spec-6.0a production shared-storage backend matrix" to "spec-6.0a production shared-storage backend matrix" (Jul 1, 2026)
sqlrush marked this pull request as ready for review (July 1, 2026 08:30)
sqlrush force-pushed the spec-6.0a-production-storage branch from 57c6e42 to cb5d420 (July 1, 2026 09:24)
sqlrush merged commit 2413df1 into main (Jul 1, 2026)
20 checks passed
sqlrush commented Jul 1, 2026

Owner Author

Merged to main.

  • Merge commit: 2413df103c474089e870162a8e8d03c5c3913ccf
  • Main head after merge: 2413df103c474089e870162a8e8d03c5c3913ccf
  • PR head merged: cb5d4206d60baf01e49bff42b288bf3ae47b42b1
  • Verified: PR head is an ancestor of origin/main
  • Remote branch retained: origin/spec-6.0a-production-storage

No tag, no shipped declaration, no release flow performed. Downstream 6.1/6.4/6.5 success-path work depending on 6.0a main availability is now unblocked.
